ABSTRACT


Community structure is a crucial element in understanding complex networks. It provides valuable insights into both the arrangement and the function of a network, aiding our comprehension of dynamic phenomena such as epidemics and information propagation. The Label Propagation Algorithm (LPA) is widely recognized for community detection: compared with other algorithms, it has advantages in running time, in the simplicity of its procedure, and in the small amount of a priori information it needs about the network structure. Its linear time complexity, however, comes with a notable drawback. LPA generates unstable community assignments, producing different combinations of communities on each execution over the same network, and this unpredictability can lead to large, less informative communities. To overcome these drawbacks, this paper proposes an Enhanced Community Detection approach that combines the Label Propagation Algorithm with Ant Colony Optimization (ECDLPA-ACO). ECDLPA-ACO not only propagates labels but also optimizes modularity by clustering similar vertices based on local similarities within the network. Experimental results on established social network datasets show that ECDLPA-ACO outperforms comparable community detection algorithms, namely the Louvain algorithm, the Infomap algorithm, and the traditional Label Propagation Algorithm, in scalability, average execution time, modularity, and computational efficiency.

Keywords: Community Detection, Louvain Algorithm, Infomap Algorithm, Label Propagation Algorithm, Ant Colony Optimization technique


1. INTRODUCTION

Recently, Social Networks (SN) have gained immense popularity for their role in facilitating digital-era interactions [1]. Broadly, a social network is portrayed as a graph, defined as a system of interactions or relationships. In this representation, nodes symbolize individuals (actors), and edges signify connections or interactions between them [2]. The rise of platforms like YouTube, Facebook, Pinterest, Etsy, Twitter, and others has propelled the analysis of network data into a critical research domain [3].

In the realm of social networks, communities are clusters of nodes where connections are dense within the group but sparse between groups [4]. Community detection stands as a significant aspect of Social Network Analysis (SNA), drawing substantial attention [5]. This process, deeply rooted in sociology, has now found widespread popularity in computational intelligence. Community detection plays a pivotal role in exploring diverse domains such as urbanization, social marketing, criminology, and beyond [6]. In the era of online social networks, the study of community detection has become integral to targeted marketing and influential campaigning [7]. It facilitates the analysis of intricate networks, allowing for the visualization of structures and the extraction of relationships. As the scope of complex networks, including social networks, continues to expand, researchers are uncovering the mechanics of epidemic spreading through the analysis of community structures in complex networks [8].

The Label Propagation Algorithm is favored for its efficiency, scalability, flexibility, noise tolerance, parameter-free nature, ability to handle overlapping communities, and adaptive characteristics, making it a valuable tool for community detection in a wide range of networks.

Ant Colony Optimization offers an effective and 
adaptable approach to community detection, 
providing robustness to noise, and scalability to 
large networks. 


The major problems observed in community detection are:

• Modularity maximization suffers from resolution limits and extreme degeneracy.

• Finding the partition with maximal modularity is an NP-hard problem.

• LPA has uncertainty and randomness in the label propagation process, which may affect the stability and accuracy of community detection.

• Heuristic algorithms have been introduced to approximate the optimal partition but can be improved.

• Large scale-free graphs with hubs pose scalability issues for community detection in distributed environments.

• Modularity optimization algorithms are often regarded as the only practical approach for detecting communities in large graphs [9]. However, their performance is weakened by the resolution limit.

The proposed ECDLPA-ACO algorithm has been developed with the objective of addressing the above problems. The algorithm works as follows. ECDLPA-ACO extracts initial communities and iteratively updates node labels based on adjacent nodes. Fully connected subgraphs are detected and merged using defined degree functions. Communities are then extracted based on node labels; the method is explained in detail in Section 3. The proposed Enhanced Community Detection method using Label Propagation Algorithm with ACO (ECDLPA-ACO) is compared with other algorithms. Assessing the method through modularity and computation time reveals positive outcomes across both small and large graphs. Particularly for very large networks, ECDLPA-ACO emerges as the superior option, thanks to its low computational time.


2. RELATED WORKS 


Community detection stands as a foundational endeavor within social network analysis, with the primary objective of identifying clusters of nodes in a network exhibiting dense internal connections but sparse connections with other clusters. Three widely recognized algorithms for community detection are the Louvain method (Louvain modularity optimization), the Infomap algorithm, and the Label Propagation Algorithm. The following explanation of each of these algorithms provides insight into their role in community detection within social networks.


The Louvain algorithm (LV) [10] extracts community structure in very large networks. Based on modularity optimization, it employs a heuristic approach with two iteratively repeated phases. In the first phase, each node is moved to the neighboring community that yields the largest modularity gain. The second phase builds a new network whose nodes are the communities found in the first phase, linked by edges with computed weights [11]. LV offers speed, scalability, and a hierarchical structure, yet the quality of its results depends on the network. For best results, one needs to consider network size, granularity, efficiency, assumptions, and preprocessing. Scalability issues arise, particularly when detecting communities in distributed environments with large scale-free graphs.


The Infomap algorithm starts by assigning each node to a separate community. Nodes are then moved, in random sequential order, to neighboring communities whenever the move reduces the map equation, and iterations continue until no further reduction is possible. The graph is then rebuilt, with the communities of the last level replaced by nodes, and the cycle repeats at the new level until no more reductions occur. Infomap detects communities using information theory, optimizing the encoding of random walks [12][13]. The results of the Infomap algorithm are a balanced partition of the graph, a minimal description length, and the inflow and outflow of each vertex. The drawback of the Infomap algorithm is its computational time.

The Label Propagation Algorithm (LPA) detects communities in large networks with linear runtime for low-density networks. Nodes start with unique labels and iteratively adopt the most prevalent label among their neighbors. Linked nodes with identical labels form communities. The advantages of LPA include (i) no parameters needed, (ii) simple and fast, and (iii) structure-based [14][15]. The limitations of LPA include (i) sensitivity to initial labels, (ii) varying performance, and (iii) the impact of noise. The drawback of LPA is that it returns different solutions (some of them of poor quality) in different realizations, and its computational time is also a concern.

3. ENHANCED COMMUNITY DETECTION 
USING ECDLPA-ACO 


In this research work, we combined the Label Propagation Algorithm with the Ant Colony Optimization Algorithm [16], making certain modifications to enhance its community detection capabilities. Our approach, known as ECDLPA-ACO, operates within the realm of social networks, exploring concealed connections among individuals. The algorithm is designed to uncover all potential overlapping communities within the network graph. The undertaking unfolds in three distinct phases:

Initialization and Setup: The initial stage involves the setup process, where a series of algorithms are primed for action. This entails the selection of pertinent social network datasets, fine-tuning algorithmic parameters, and the creation of a unified graph model derived from the chosen datasets.

Construction of Community Detection Framework: The second step centers on the development of a community detection framework, characterized by a comprehensive, abstracted approach to community detection workflows. Here, we begin with the establishment of an initial community structure based on the dataset. Subsequently, nodes of interest are identified, leading to a reconstruction of the graph. Further, the algorithm distinguishes between classified and common nodes. The proposed methodology is then applied to assess communities, with an emphasis on maximizing modularity by reallocating vertices to their neighboring communities.

Figure 1: Structure of Proposed ECDLPA-ACO (stages shown: Setup; Reconstruction graph; Common nodes; Evaluating communities; update; Evaluation metrics: Modularity, Computational Time)


Community Extraction: The last stage involves extracting communities with the help of a thorough evaluation system that takes into account various aspects of community detection. We present a hybrid method that combines the Label Propagation Algorithm and the Ant Colony Optimization (ACO) algorithm. In this work, the graph's edges are weighted based on similarity indices, boosting the algorithm's effectiveness in community detection.


This algorithm leverages Ant Colony Optimization to calculate link rankings. It does so by computing pairwise distances between nodes and subsequently partitioning the network. It then propagates labels and identifies communities within networks through local optimization of modularity measures with individual ants. Figure 1 illustrates the structure of the proposed ECDLPA-ACO algorithm.


Proposed ECDLPA-ACO algorithm

Input: Network G = (V, E), with a maximum number of iterations denoted as "master."
Output: Community set setc = {c1, ..., ck}, and the number of communities "k".


Step 1: Assign a unique label to each node in the network. The structural influence, tsi(i, j), signifies the impact of node i on node j. If (i, j) is an edge in the network, tsi(i, j) is calculated as the ratio of the actual number of connections from node i to the neighbors of j to the maximum possible number of such connections. Therefore, tsi(i, j) is defined as:

$$ \mathrm{tsi}(i, j) = \frac{1 + |\Gamma(i) \cap \Gamma(j)|}{|\Gamma(j)|} \qquad (1) $$

where Γ(i) represents the neighbor set of node i, deg(i) = |Γ(i)| denotes the number of neighbors of node i (the degree of i), and |Γ(i) ∩ Γ(j)| signifies the count of shared neighbors between nodes i and j, essentially indicating the number of triangles connecting them. If node i is linked to all of node j's neighbors, it implies a substantial influence from i to j. It is crucial to note that the tsi from node i to node j may differ from the tsi from node j to node i, which is defined as follows:

$$ \mathrm{tsi}(j, i) = \frac{1 + |\Gamma(i) \cap \Gamma(j)|}{|\Gamma(i)|} \qquad (2) $$

Nodes with dense connections showcase higher tsi values compared to those with sparse connections. This emphasizes that nodes within a community exhibit elevated tsi values among themselves while minimizing their influence on nodes outside the community.
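The structural influence of Step 1 can be illustrated with a minimal sketch. It assumes an undirected graph stored as a dictionary of neighbor sets and assumes that the denominator is the degree of the influenced node j, as the prose definition suggests; the function name `tsi` and the toy graph are illustrative only.

```python
def tsi(adj, i, j):
    """Structural influence of node i on node j for an edge (i, j).

    adj maps each node to the set of its neighbours (undirected graph).
    Assumption: the ratio is normalised by deg(j), the influenced node.
    """
    if j not in adj[i]:
        raise ValueError("tsi is only defined for adjacent nodes")
    shared = len(adj[i] & adj[j])        # common neighbours = triangles on (i, j)
    return (1 + shared) / len(adj[j])


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}

print(tsi(adj, 1, 3))  # (1 + 1) / 3
print(tsi(adj, 3, 1))  # (1 + 1) / 2
print(tsi(adj, 4, 3))  # (1 + 0) / 3
```

The asymmetry is visible on the edge (1, 3): node 3 reaches every neighbor of node 1, so its influence on node 1 is 1.0, while node 1 reaches only part of node 3's neighborhood.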
(1) Set the iteration number t = 1.

(2) For every node x ∈ X, update the node label iteratively following either formula (3) or (4). During this iterative process, each node adopts the label held by the largest number of its neighboring nodes. When multiple labels are equally prevalent, LPA randomly selects one of them to update the node label. Label updating methods can be categorized into synchronous and asynchronous approaches. The synchronous update method updates the label of node x in the t-th iteration based on the labels of its adjacent nodes at the (t-1)-th iteration. The formula is as follows:

$$ c_x(t) = f\big(c_{x_1}(t-1), \ldots, c_{x_k}(t-1)\big), \quad x_i \in N(x) \qquad (3) $$

where c_x(t) represents the label of node x in iteration t, and N(x) represents the neighborhood set of node x. Synchronous updating may lead to label oscillation in networks with bipartite or nearly bipartite structure, and asynchronous updating provides a viable solution to this issue [17].

In asynchronous updating, the label of a node x in the t-th iteration is computed from a combination of labels from two sources: (i) the labels of those adjacent nodes that have already been updated in the current iteration, and (ii) the labels from the (t-1)-th iteration of those adjacent nodes that have not yet been updated in the current iteration.

The formula is as follows:

$$ c_x(t) = f\big(c_{x_1}(t), \ldots, c_{x_m}(t), c_{x_{m+1}}(t-1), \ldots, c_{x_k}(t-1)\big), \quad x_i \in N(x) \qquad (4) $$
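The asynchronous update rule can be sketched in a few lines of plain Python. This is a minimal illustration of standard label propagation, not the full ECDLPA-ACO procedure; the function name, the `seed` parameter, and the convention of keeping the current label when it is already among the most frequent ones are assumptions made for the example.

```python
import random
from collections import Counter


def label_propagation(adj, max_iter=100, seed=0):
    """Asynchronous label propagation on an undirected graph.

    adj maps each node to the set of its neighbours.
    Returns a dict mapping each node to its final label.
    """
    rng = random.Random(seed)
    labels = {v: v for v in adj}          # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):
        rng.shuffle(nodes)                # random visiting order in each sweep
        changed = False
        for v in nodes:
            if not adj[v]:
                continue                  # isolated nodes keep their own label
            # Neighbours visited earlier in this sweep already carry their
            # iteration-t labels; the others still carry labels from t-1.
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            if labels[v] in candidates:
                continue                  # assumption: keep a label that is already maximal
            labels[v] = rng.choice(candidates)   # break ties at random
            changed = True
        if not changed:
            break                         # no label moved during a full sweep
    return labels


# Two disjoint triangles: each one collapses to a single label.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = label_propagation(adj)
communities = {}
for node, lab in labels.items():
    communities.setdefault(lab, set()).add(node)
print(sorted(map(sorted, communities.values())))  # [[0, 1, 2], [3, 4, 5]]
```

Which label survives in each triangle depends on the random visiting order, which is exactly the instability that motivates the proposed method; only the grouping of nodes is stable in this toy case.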
Figure 2: Flow chart of ECDLPA-ACO


Step 2: Evaluating Communities

Ants visit unvisited vertices through paths of similar vertices to construct the set of communities, based on the significance of each vertex for each community; this process is performed iteratively until all vertices are labeled. Following this, the pheromones of the paths traversed by the ants are updated. The set of communities discovered by the ants is evaluated using the modularity measure, as follows:


$$ Q = \frac{1}{2m} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(c_i, c_j) \qquad (5) $$

where A indicates the adjacency matrix of the input network, with A_ij being one when vertex i is connected to vertex j and zero otherwise, m represents the total number of edges in the input network, k_i denotes the degree of vertex i, c_i is the community of vertex i, and δ(c_i, c_j) is the delta function, yielding 1 if vertices i and j are in the same community and 0 otherwise. If the new modularity value surpasses the average of all previously obtained modularity values, the pheromones are increased; otherwise, they are decreased. After estimating the tsi values between nodes, distinctive initial labels are assigned to each node in the network. In this uniform label initialization process, each node's label is adjusted to match the label of its neighbors based on the calculated tsi values, as given in Equation (6).


$$ l_j^{\text{init}} = \arg\max_{i \in \Gamma(j)} \; (l_i, l_j) \cdot f(i, j) \qquad (6) $$

where l_j denotes the initial label of node j, and f(i, j) is a function that returns 1 if tsi(i, j) is greater than tsi(j, i) and tsi(i, j) is less than L, where L denotes the label. Equation (6) can also be interpreted as every node in the network attempting to update the label of each of its neighbor nodes with its own label if f(i, j) is satisfied.


Step 3: Stopping conditions

The cycle of calculating transition probabilities, ant traversal of vertices, label propagation, assembly of a candidate community set, assessment of the derived communities, and pheromone updates continues until further improvement is unattainable or the algorithm's iteration count surpasses a predefined limit.
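The pheromone rule of Step 2 and the stopping conditions of Step 3 can be sketched together. The paper does not give the size of the pheromone change or a precise test for "no further improvement", so the multiplicative `step`, the lower bound `floor`, the `patience` window, and the tolerance `tol` below are all hypothetical choices, as is treating the very first cycle as an improvement.

```python
def update_pheromone(pheromone, path, q_new, q_history, step=0.1, floor=0.01):
    """Reinforce or weaken the edges on an ant's path.

    pheromone: dict mapping an edge (u, v) to its pheromone level.
    path: edges traversed by the ant in this cycle.
    q_new: modularity of the community set built in this cycle.
    q_history: modularity values from earlier cycles (extended in place).
    """
    # Assumption: with no history yet, the first cycle counts as an improvement.
    baseline = sum(q_history) / len(q_history) if q_history else float("-inf")
    factor = 1 + step if q_new > baseline else 1 - step
    for edge in path:
        pheromone[edge] = max(floor, pheromone[edge] * factor)
    q_history.append(q_new)


def should_stop(q_history, iteration, max_iter, patience=3, tol=1e-6):
    """Stop when the iteration limit is exceeded or modularity has stalled."""
    if iteration > max_iter:
        return True
    if len(q_history) <= patience:
        return False
    best_before = max(q_history[:-patience])
    # Stalled: none of the last `patience` cycles beat the earlier best.
    return max(q_history[-patience:]) <= best_before + tol


pheromone = {(0, 1): 1.0, (1, 2): 1.0}
history = []
update_pheromone(pheromone, [(0, 1)], 0.30, history)   # first cycle: reinforced
update_pheromone(pheromone, [(1, 2)], 0.20, history)   # below average 0.30: weakened
print(pheromone)
```

Comparing against the running average rather than the best value so far, as the text describes, lets a path be reinforced even when it does not set a new record, which keeps the search from narrowing too early.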


4. EXPERIMENTAL RESULTS

In this study, scalability, average execution time, modularity, and computational efficiency were employed as evaluation metrics, all of which are widely recognized for assessing the quality of detected communities. Modularity is a particularly useful metric for quantifying how much denser the connections within communities are than would be expected at random. When the network's community structure is known, the algorithm's effectiveness can additionally be assessed by comparing the discovered partition to the known real partition.


In this work, the performance of ECDLPA-ACO was compared with the Louvain algorithm, the Infomap algorithm, and LPA on three social network datasets (Twitter, Ego-Gplus, Ego-Facebook). The experiments were conducted on a computer with a 3.4 GHz Intel Core i7 CPU and 16.0 GB of RAM, with the algorithms implemented in Python using NetworkX.


Journal of Theoretical and Applied Information Technology
15 May 2024. Vol.102. No 9
© Little Lion Scientific

ISSN: 1992-8645 www.jatit.org E-ISSN: 1817-3195


Datasets:

1. Twitter: Consists of 'circles' from public sources, including node features, circles, and ego networks.

2. Ego-Gplus: Encompasses 'circles' from Google+, collected manually through the 'share circle' feature, with node features, circles, and ego networks.

3. Ego-Facebook: Comprises 'circles' from Facebook, collected via a survey. It includes node features, circles, and ego networks. For privacy, Facebook-internal ids have been replaced, and feature interpretation is obscured.

Scalability: The scalability of the proposed algorithm was compared with others using the Ego-Facebook, Twitter, and Ego-Gplus datasets. These datasets include a substantial number of nodes (n = 10,000), an average degree of k = 20, and community sizes within the range [minc, maxc] = [100, 500].

Figure 3: c) Initial label node

Figure 3: d) Final communities (communities shown: 1 = {1, 2, 3, 4, 5, 6}; 14 = {7, 8, 9, 10, 11}; 48 = {12, 13, 14, 15, 16, 17, 18})

The scale-up metric acts as a benchmark to evaluate the efficiency of the parallel algorithm in handling larger datasets when more iterations occur, and can be defined as

$$ \text{scaleup} = \frac{T_s}{T_p} $$

where T_s denotes the sequential execution time of the algorithm for processing the given dataset on a single node, and T_p represents the parallel execution time of the algorithm for handling a dataset p times larger on p times as many nodes.
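As a small worked example of the scale-up ratio, the sketch below divides a sequential time by a parallel time. The timings are hypothetical values chosen for illustration and are not measurements from this study.

```python
def scaleup(t_s, t_p):
    """Scale-up ratio: sequential time on one node divided by the parallel
    time on p nodes for a dataset p times larger."""
    if t_p <= 0:
        raise ValueError("parallel execution time must be positive")
    return t_s / t_p


# Hypothetical timings in seconds (illustrative only).
print(scaleup(t_s=12.0, t_p=15.0))  # 0.8
```

A value close to 1 means the larger dataset on proportionally more nodes takes about as long as the original dataset on one node; values well below 1 indicate growing overhead.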


Figure 3: b) Influence as weights to each edge 
Table 1: Comparison of scalability for 3 datasets

[row and column labels illegible] 12.4 | 10.2 | [illegible] | 5.4
[row and column labels illegible] 14.7 | [cells illegible] | 8.9

To validate the scalability of ECDLPA-ACO, the dataset size was increased from 100,000 to 500,000 data points. In Figure 4, the results highlight that ECDLPA-ACO exhibits superior scalability values compared to the other algorithms. Both the number of iterations and the dataset size demonstrate proportional growth. The proposed ECDLPA-ACO algorithm showcases outstanding scalability and adaptability when dealing with large datasets.

Figure 4: Scalability (y-axis: Scaleup [Execution Time (Sec)]; x-axis: Ego-Facebook, Twitter, Ego-Gplus; series: Louvain, Infomap, LPA, ECDLPA-ACO)

Average Running Time: In Figure 5, the average running time of the proposed ECDLPA-ACO algorithm is plotted, demonstrating its superior speed compared to the Louvain, Infomap, and Label Propagation Algorithms.

The average execution times in seconds for the Louvain Algorithm, Infomap Algorithm, Label Propagation Algorithm, and ECDLPA-ACO on Twitter data, with parameters k = 40 and [minc, maxc] = [200, 1000], were measured across node numbers n ranging from 100,000 to 500,000. As shown in Figure 5, the runtime of ECDLPA-ACO follows a linear scaling pattern with the dataset size and is notably faster than that of the Louvain, Infomap, and Label Propagation Algorithms.

Table 2: Comparison of Average Running Time for 3 datasets

Algorithms | Louvain | Infomap | LPA | ECDLPA-ACO
Ego-Facebook | [illegible] | 9.414 | 5.918 | 3.113
[row illegible]
[row label illegible] | 22.123 | 18.245 | 13.451 | 8.219

ECDLPA-ACO efficiently completes the community detection process in less than 10.4 seconds for a network containing 500,000 nodes, showcasing both its speed and scalability. Consequently, ECDLPA-ACO outperforms the other tested algorithms in accurately identifying genuine community structures within the networks.

Figure 5: The average running time (y-axis: Running Time (Sec); x-axis: Ego-Facebook, Twitter, Ego-Gplus; series: Louvain, Infomap, LPA, ECDLPA-ACO)

The modularity of Communities:

One key metric for evaluating the quality of a community partition is referred to as "Modularity." In the context of an undirected graph G(E, V), where each node is assigned to one of C potential communities, the modularity of the partition is defined as follows:

$$ Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j) $$

where A_ij are the elements of the adjacency matrix of G(E, V), k_i is the degree of node i, m = |E|, and δ(c_i, c_j) is equal to 1 if i and j belong to the same community and 0 otherwise.
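The modularity definition can be transcribed almost directly into code. The following sketch assumes an undirected, unweighted graph without self-loops, stored as a dictionary of neighbor sets; it follows the double sum literally and is therefore quadratic in the number of nodes, so it is meant for checking small examples rather than for large networks.

```python
def modularity(adj, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    adj maps each node to the set of its neighbours;
    community maps each node to its community label.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())   # 2m = sum of all degrees
    if two_m == 0:
        return 0.0
    total = 0.0
    for i in adj:
        for j in adj:
            if community[i] != community[j]:
                continue                               # delta(c_i, c_j) = 0
            a_ij = 1.0 if j in adj[i] else 0.0
            total += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return total / two_m


# Two disjoint triangles.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}, 3: {4, 5}, 4: {3, 5}, 5: {3, 4}}
by_triangle = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
single = {v: "a" for v in adj}

print(modularity(adj, by_triangle))  # approximately 0.5
print(modularity(adj, single))       # approximately 0.0
```

Placing every node in one community always gives a modularity of zero, which is why the measure rewards partitions only when they capture more internal edges than a random graph with the same degrees would.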


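Table 3: Comparison of modularity for 3 datasets

Algorithms | Louvain | [remaining header cells illegible]
Ego-Facebook | 8.4 [remaining cells illegible]
Twitter | 16.23 | 14.27 [remaining cells illegible]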


In this study, the modularity metric is employed as a measure to assess the effectiveness of the generated communities. The modularity metric results are visually represented in Figure 6.

Figure 6: Modularity of Communities (y-axis: Modularity of Communities; x-axis: Ego-Facebook, Twitter, Ego-Gplus)


The findings suggest that the proposed algorithm consistently produces commendable modularity values, establishing its efficacy and superiority over alternative algorithms in community detection. The ECDLPA-ACO algorithm consistently outperforms the others, exhibiting values ranging from 3.113 to 8.219. In contrast, Louvain, Infomap, and LPA perform less well than the proposed algorithm.
Computational Time: The ECDLPA-ACO algorithm was compared against the Louvain, Infomap, and Label Propagation algorithms. As illustrated in Figure 7, the proposed method consistently outperformed the baseline algorithms across all datasets and networks, showcasing superior performance, particularly in larger networks with tens of millions of nodes. Notably, the Louvain, Infomap, and Label Propagation algorithms struggled to handle datasets of such magnitude.


Table 4: Comparison of Computational Time for 3 datasets

Algorithms | [illegible] | Infomap | LPA | [illegible]
Ego-Facebook | 10.21 | 7.24 | 5.14 [one cell illegible]
[row label illegible] | 14.25 | 12.44 | [illegible] | [illegible]


Furthermore, when tasked with detecting communities of the same size, the ECDLPA-ACO algorithm exhibited a notable advantage in speed, surpassing the compared methods by two to three orders of magnitude. This speed advantage stems from the ECDLPA-ACO algorithm's reduced computation time per iteration. Additionally, its computational efficiency is enhanced because it calculates the modularity gain of a single node only in the two communities where the movement occurs. This sets the ECDLPA-ACO algorithm apart as one of the fastest and most efficient overlapping community detection algorithms available.
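The idea of evaluating a move using only the two affected communities can be sketched with the standard modularity-gain expression for an undirected, unweighted graph. This is the textbook local-gain formula, shown under the assumption that the paper uses an equivalent computation; the function name and the toy graph are illustrative. For brevity the community degree totals are recomputed by scanning the nodes, whereas a practical implementation would cache them.

```python
def move_gain(adj, community, v, target):
    """Modularity change from moving node v into community `target`.

    Only v's current community and `target` enter the computation.
    adj maps each node to its neighbour set; community maps node -> label.
    """
    m = sum(len(nbrs) for nbrs in adj.values()) / 2      # number of edges
    source = community[v]
    if source == target:
        return 0.0
    k_v = len(adj[v])
    # Edges from v into each of the two communities.
    k_v_src = sum(1 for u in adj[v] if community[u] == source)
    k_v_tgt = sum(1 for u in adj[v] if community[u] == target)
    # Total degree of each community, with v taken out of its source.
    tot_src = sum(len(adj[u]) for u in adj if community[u] == source and u != v)
    tot_tgt = sum(len(adj[u]) for u in adj if community[u] == target)
    return (k_v_tgt - k_v_src) / m - k_v * (tot_tgt - tot_src) / (2 * m * m)


# Two triangles joined by the bridge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}

# Moving node 2 across the bridge lowers modularity, so the move is rejected.
print(move_gain(adj, community, 2, "B"))  # negative, equal to -23/98
```

On this graph the full modularity drops from 5/14 to 6/49 when node 2 changes sides, a difference of -23/98, which matches the local computation without touching any other community.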
Figure 7: Computational Time (y-axis: Computational Time (Sec); x-axis: Ego-Facebook, Twitter, Ego-Gplus; series: Louvain, Infomap, LPA, ECDLPA-ACO)

5. CONCLUSION 


ECDLPA-ACO is an enhanced Label Propagation Algorithm that incorporates Ant Colony Optimization to improve community modularity. The proposed algorithm addresses the observed problems and outperforms the Louvain, Infomap, and Label Propagation Algorithms in terms of scalability, execution time, modularity, and computational efficiency, as demonstrated in experiments on social network datasets.


Future research avenues may explore the adaptability of ECDLPA-ACO for community detection in dynamic networks, where community structures evolve over time. Strategies for updating communities as the network undergoes changes could also be developed. Additionally, the extension of ECDLPA-ACO to support multi-resolution community detection would enable the algorithm to identify communities at various levels of granularity within a network, which is particularly beneficial for analyzing hierarchical networks. Further exploration could involve hybrid approaches, combining ECDLPA-ACO with other community detection algorithms or machine learning techniques, to capitalize on their complementary strengths.